SECTION I
INTRODUCTION 


A.  OVERVIEW 

Rapid advances have been made during the past several years in large-scale integrated circuit (LSIC) technology. These advances have had a significant impact on many military signal processing functions found in such applications as forward-looking infrared (FLIR), radar, guidance and control, and ECM systems. In particular, image processing system studies for video bandwidth reduction, FLIR automatic cueing, 3-D target classification, and image understanding have consistently recommended using LSIC technologies to perform critical image processing functions. A general-purpose algorithm¹ which uses the linear operation

$$\sum_{i=1}^{M} W_i X_i$$

on a single video line or on an n by n block of picture elements is an ideal candidate to be implemented with LSIC technologies.

While such algorithms can be executed easily at low data rates using general-purpose minicomputers or even commercial microprocessors, it is usually not possible to execute them in real time in an airborne environment because of excessive size, weight, power dissipation, and cost. The key to effective system design is to apply LSIC technology to minimize the overall component count and variety of components while absorbing as much as possible of the control and timing logic onto the information processing chips themselves. The solution is optimum when the same chips can be used for a multitude of other applications to provide a high volume market. These requirements have led to the desire for a programmable chip architecture and, in turn, to the concept of a parallel/serial input bus. The discovery of the read-only memory (ROM) accumulate²⁻⁴ algorithm to implement the above-mentioned linear operator could have significant impact on image processing systems.

B.  OBJECTIVE 

The objective of this 12-month exploratory development effort was to develop a general-purpose digital LSIC device called the programmable image processing element (PIPE), which can be programmed to perform a variety of signal processing functions such as Cosine and Hadamard transforms, edge extraction, unsharp masking, pole-zero filtering, and signal smoothing on 8 by 1 or 3 by 3 element data blocks at full TV data rates of 10 megasamples per second. Such a device should find widespread application in anti-jam video data links, FLIR automatic cuers, target classifiers, and digital filter processors. Hence, the PIPE LSIC is suited for those airborne applications where data rate, size, power, and weight restrictions prohibit the use of general-purpose microprocessors or high-speed digital multipliers. This contract will address only the design and photomask fabrication of the PIPE LSIC.

C.  SUMMARY 

During the design phase of this contract, investigations of ROM technologies and designs to ensure a user-oriented image processing LSIC were conducted. The results of this investigation are detailed in the PIPE LSIC design discussion given in Subsection II.A. A brief summary of the PIPE LSIC follows.

Texas  Instruments  was  particularly  well  qualified  to  execute  this  contract  successfully 
because  of  its  low  risk  implementation  approach.  This  low  risk  implementation  approach  is  a 
result  of: 

•  User-oriented PIPE LSIC architecture

•  Established, cost-effective LSIC technology

•  Proven erasable, programmable ROM design

•  Breadboard emulator of PIPE LSIC.

Several architectures were investigated as candidates for implementing the PIPE LSIC. The architecture selected maximizes flexibility in algorithm implementation and minimizes external control and timing logic.

The N-channel metal-oxide-semiconductor (NMOS) technology was selected to implement the PIPE LSIC because it is an established, cost-effective technology capable of providing the high circuit density, low speed-power product, and user erasable programmable read-only memory (EPROM) needed for the user-oriented PIPE LSIC architecture.

The NMOS erasable programmable read-only memory was judged to be the best technology to meet the goals of the PIPE LSIC. The advantages of NMOS EPROM technology are:

•  User-programmable  (electrically) 

•  Erasable (ultraviolet light)

•  Nonvolatile 

•  Single  supply  voltage  operation 

•  Static  on-chip  NMOS  logic 

•  Military  grade 

•  Established technology.

Texas Instruments has available in production quantities a family of military-grade EPROMs ranging in size from 8K (1K = 1024 bits) to 32K. This, along with the features of EPROMs, makes the EPROM technology ideal for the PIPE LSIC program.

The PIPE LSIC design completed during this contract is shown in Figure 1. As illustrated in this figure, there are five major sections of this integrated circuit:

•  Input  latch  parallel-to-serial  shift  register 

•  EPROM

•  Shift-and-accumulate

•  Tri-state  output  latch 

•  Controller.

Figure 1. PIPE LSIC Layout

Each of these functions is discussed in detail in Subsections II.A.2 through II.A.6, respectively. The total bar size is approximately 240 by 270 mils², with a total estimated power of 600 milliwatts.

Texas Instruments believes the proposed Phase II PIPE program is very relevant and timely for the development of a programmable image processing element and has carefully considered the applications of the PIPE LSIC. The PIPE LSIC is ideal for 3 by 3 window operations and 8 by 1 or 9 by 1 matrix operations. A more detailed discussion of the applications of the PIPE LSIC is presented in Subsection II.B.


SECTION  II 

TECHNICAL  DISCUSSION 


A.  PIPE  LSIC  DESIGN 

Texas Instruments has designed a programmable image-processing-element (PIPE) large-scale integrated circuit (LSIC) during Phase I of the PIPE program, contract F33615-70-C-1763. Details of this design are discussed in this section.

1. Introduction

Many image-processing algorithms¹ require a sum of products operator of the form

$$\sum_{i=1}^{M} W_i X_i \qquad (1)$$

where the $W_i$ represent a set of fixed programmable weighting coefficients, the $X_i$ represent a set or sequence of input values, and M represents the number of inputs. This mathematical function can be used to calculate the coefficients of various transforms used in many image-processing applications, such as the Fourier, Cosine, Hadamard, Haar, and others. The sum of products expression can also be used in determining many neighborhood operators which perform such operations as noise smoothing, edge enhancement, and edge crispening. For many image-processing applications such as video bandwidth reduction, forward-looking infrared (FLIR) autocueing, target classification, and image understanding, algorithms based on the sum of products operator are required.

The sum of products operation of Equation 1 can be implemented using digital integrated circuit (IC) multipliers and an accumulator, as shown in Figure 2. However, the size and power required to perform the multiplications with digital IC multipliers at video data rates are prohibitive for many airborne image-processing applications. Hence, investigations have been performed recently on techniques for the realization of the sum of products operation without using digital multipliers.²⁻⁴ These distributed arithmetic techniques use a table look-up procedure to perform the multiplication in Equation 1. This table look-up operation replaces the digital multipliers with a read-only-memory (ROM) function. The principle of operation of the ROM-accumulate algorithm is discussed below.

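As noted in Equation 1, the weighting coefficients, $W_i$, are fixed and known while the input words, $X_i$, are variable. In binary arithmetic, the input variable, $X_i$, can be expressed as

$$X_i = \sum_{j=0}^{B-1} \alpha_{ij}\, 2^{j} \qquad (2)$$

where B is the number of bits per input word and the term $\alpha_{ij}$ is the binary value in each "digit" position; that is,

$$\alpha_{ij} = 0 \text{ or } 1.$$

By substituting Equation 2 into Equation 1, the following expression is obtained:

$$\sum_{i=1}^{M} W_i X_i = \sum_{i=1}^{M} W_i \sum_{j=0}^{B-1} \alpha_{ij}\, 2^{j}. \qquad (3)$$

Interchanging the order of summation gives

$$\sum_{i=1}^{M} W_i X_i = \sum_{j=0}^{B-1} 2^{j} \left[ \sum_{i=1}^{M} W_i\, \alpha_{ij} \right]. \qquad (4)$$

Because the $W_i$ are fixed, the bracketed term can be calculated for all possible combinations of the $\alpha_{ij}$ and stored in a ROM.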

Figure 2. Block Diagram of the Multiplier-Accumulator Algorithm for Implementing Sum of Products Operator

Therefore, the particular $\alpha_{ij}$ from the incoming data can be used to address the ROM and fetch the associated value of the bracketed term. Once the particular value of the bracketed term is obtained, Equation 4 can be computed by shifting this bracketed term by one bit position before accumulating the previous sum. A block diagram of this ROM-accumulate algorithm is shown in Figure 3. Note that the $X_i$ terms are bit-serial due to the rearrangement of terms in Equation 4. The size of the ROM required in Figure 3 is $2^M$ words, where M is the maximum number of input values, as given in Equation 1. The number of bits per word in the memory is the sum of the word length of $W_i$ and the value of $\log_2 M$.
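The table look-up procedure described above can be illustrated with a short software sketch. The Python below is a behavioral model under stated assumptions, not the PIPE circuit: the function names `build_rom` and `rom_accumulate`, the nine example weights, and the 8-bit unsigned input samples are all hypothetical. It precomputes the bracketed term for every combination of one bit per input, then forms the sum of products using only look-ups, shifts, and additions.

```python
def build_rom(weights):
    """Precompute the bracketed term for every combination of one bit per input."""
    m = len(weights)
    rom = []
    for address in range(2 ** m):
        # Bit i of the address is the current bit of input X_i.
        rom.append(sum(w for i, w in enumerate(weights) if (address >> i) & 1))
    return rom


def rom_accumulate(rom, inputs, bits):
    """Sum of products for unsigned inputs, one bit slice per step, no multiplier."""
    acc = 0
    for j in range(bits):
        # Gather bit j of every input word into one ROM address.
        address = 0
        for i, x in enumerate(inputs):
            address |= ((x >> j) & 1) << i
        # Weight the looked-up term by its bit position, then accumulate.
        # (Hardware would typically shift the running sum instead; the
        # arithmetic result is the same.)
        acc += rom[address] << j
    return acc


weights = [3, -1, 4, 1, -5, 9, 2, -6, 5]        # hypothetical coefficients, M = 9
inputs = [17, 200, 3, 255, 0, 64, 128, 99, 1]   # hypothetical 8-bit samples

rom = build_rom(weights)
result = rom_accumulate(rom, inputs, bits=8)
direct = sum(w * x for w, x in zip(weights, inputs))
print(len(rom), result == direct)
```

With nine inputs the table holds 512 words, and the bit-serial result agrees with the directly multiplied sum because no term is rounded or truncated.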


As noted in the above discussion, the input data to the ROM is bit-serial. However, in a typical image-processing application, the input data is given in a parallel word format. Hence, the block diagram of Figure 3 is modified to change the input data from parallel format to bit-serial format, as shown in Figure 4. This input data can be reformatted with a parallel-to-serial shift register circuit. Also shown in Figure 4 is an output buffer, which buffers the outputs of the shift-and-accumulate function, and a controller function, which provides the necessary controls to the other functions. A brief discussion of the design of the PIPE LSIC follows.

The following design goals were used in the development of the PIPE LSIC. The PIPE LSIC will accept input data of up to 8 bits at a 20-MHz data rate. Data entry can be in 3 × 3 blocks or 9 × 1 blocks defined by the user. Data is entered via an input strobe at a 20-MHz rate. Output data is presented on a tri-state bus of up to 20 bits in parallel. Output timing is compatible with the input, allowing device chaining. All inputs and outputs are TTL- and CMOS-compatible with a fanout of three. A master clock controls all data sequencing and timing. The PIPE LSIC will be user-programmable.

Figure 3. Block Diagram of the ROM-Accumulator Algorithm for Implementing Sum of Products Operator


Figure 4. Block Diagram of ROM-Accumulator Algorithm With Parallel-to-Serial Shift Register, Buffer Output, and Controller

The simplified block diagram of the PIPE LSIC architecture is shown in Figure 5. As shown in Figure 5, the input data S, T, and P are parallel words that can be loaded into the input latches serially or in parallel. The particular mode of operation is controlled by the user through the use of one of the control input lines. This allows the PIPE LSIC to operate on 9 × 1 blocks or 3 × 3 blocks of data. In the serial mode, all data is loaded through the P input pins and latch, and then is sequentially clocked through the other latches. In the parallel mode, the data is loaded through the S, T, and P input pins into three separate input latches. The input data is then sequentially clocked into the other latches. In the serial mode, nine sample periods are required to load all of the input latches; in the parallel mode, three sample periods are required. This method of data entry eliminates the need for latch address pins and facilitates convolution and sliding-window operations with a single part.
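A minimal sketch of the two loading modes follows. It assumes, hypothetically, that the nine input latches are arranged as three chains of three, fed by the P, T, and S pins in parallel mode and linked end to end behind the P pin in serial mode. The function name `load_block` and the latch ordering are illustrative only; the sample-period counts are the part taken from the text.

```python
def load_block(samples, parallel):
    """Load nine samples and return (latch contents, sample periods used)."""
    if len(samples) != 9:
        raise ValueError("a block holds nine samples")
    latches = [None] * 9
    periods = 0
    if parallel:
        # Three samples, one per input pin, enter on every strobe.
        for k in range(0, 9, 3):
            p, t, s = samples[k:k + 3]
            latches[0:3] = [p] + latches[0:2]   # P chain shifts by one latch
            latches[3:6] = [t] + latches[3:5]   # T chain shifts by one latch
            latches[6:9] = [s] + latches[6:8]   # S chain shifts by one latch
            periods += 1
    else:
        # Everything enters through P and ripples through all nine latches.
        for x in samples:
            latches = [x] + latches[:8]
            periods += 1
    return latches, periods


block = list(range(1, 10))   # hypothetical sample values
serial_latches, serial_periods = load_block(block, parallel=False)
parallel_latches, parallel_periods = load_block(block, parallel=True)
print(serial_periods, parallel_periods)
```

Under this assumed layout, one further strobe in parallel mode shifts a new column of three samples in and the oldest column out, which is one way to read the remark about sliding-window operations.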


Figure 5. Simplified Block Diagram of the PIPE LSIC Architecture

The bit-parallel words in the input latches are converted into bit-serial words by the parallel-to-serial registers. The outputs of the parallel-to-serial registers form the 9-bit memory address. The memory outputs are shifted and accumulated to complete the sum of products operation. Tri-state output latches are provided for off-chip buffering. All timing and control pulses for the parallel-to-serial registers, the memory, the shift-and-accumulate, and the output buffers are generated on-chip using a simple shift register controller with a load pulse and master clock as inputs. The shift-and-accumulate circuitry will operate with either unsigned magnitude data or 2's complement data as selected by the user. The design is user-oriented with respect to both the memory technology and the number of control lines required to operate the chip. Table 1 defines the control inputs of Figure 5.

The PIPE LSIC design and layout have been completed. Figure 6 is the diagram of the IC barmap, which is representative of the actual layout. Figure 7 is the CALCOMP plot of the PIPE LSIC. The major sections of this layout are:

•  Input latches parallel-to-serial shift register

•  EPROM

•  Shift-and-accumulate

•  Tri-state output latch

•  Controller.

The area and estimated power for each of these sections are given in Table 2. The area for each function in this table does not include the area used for lead routing. The total bar size is approximately 240 × 270 mils² with a total estimated power of 600 mW. Each of the functions listed above is


TABLE 1. USER-DEFINED CONTROLS

Control Line(s)                 Function
Word length (3-bit BCD code)    Defines the word length in bits of the input data.
Parallel/serial                 Determines mode of chip operation: 3 × 3 or 9 × 1 operations.
Master clock                    A square-wave clock provided for system timing (<20 MHz).
Load                            Initiates the parallel-to-serial data conversion (<20 MHz).
Input strobe                    Indicates valid input data and latches it in the input latches (<20 MHz).
2's complement                  Defines signed or unsigned magnitude data operation.
Enable                          Used to tri-state or enable the output bus.
V_DD                            Single +5-V operating supply.
V_PP                            Normally at 5 V but taken to 25 V for EPROM programming.
Data valid                      An output signal indicating a complete computation.
Two's complement coefficients   Used to set the sign bits of the output word when <8-bit input data is used.

Figure 6. Diagram of the PIPE LSIC Barmap

Figure 7. CALCOMP Plot of PIPE LSIC
TABLE 2. AREA AND ESTIMATED POWER OF THE PIPE LSIC

Function                            Area (mil²)   Power (mW)
Input latch P-to-S shift register   8,400         175
EPROM memory                        12,000        150
Shift-and-accumulate                10,800        95
Tri-state output latch              3,000         70
Control and timing                  2,400         110


discussed in detail in Subsections II.A.2 through II.A.6, respectively. A hardware demonstration of the sum of products operator using the ROM-accumulate algorithm is discussed in Subsection II.A.7.

2. Input Latch Parallel-to-Serial Shift Register

This subsection discusses the input latch parallel-to-serial shift register function of the PIPE LSIC. The block diagram of this input structure is shown in Figure 8. The main parts of this structure are:

•  Input  multiplexer  and  latch 

•  Parallel-to-serial shift register

•  Memory  address  drivers 

•  Memory  address  select. 

Figure 8. Block Diagram of the Input Structure

A functional block diagram of a single input stage (1 of 9) is shown in Figure 9. Transistors M1 through M16 form the input multiplexer; latches L0 through L15 form the input latch; registers Reg 1 through Reg 14 form the parallel-to-serial shift register; Reg 0 is the memory driver register with increased drive capabilities; and M17 is the address select multiplex transistor used in programming the EPROM. The input multiplexer switch transistors (M1 through M16) select a data path for serial or parallel data input operation. This selection is made possible by the parallel or serial (PAR/SER) select line (TTL level), which is buffered to an MOS level by inverter I1 as shown in Figure 9. Inverter I2 provides the complement of the PAR/SER line for selecting the multiplexer switch transistors (M1 through M16). Once a data path has been selected, the data proceeds through latches L0 through L7 and awaits entry to latches L8 through L15. The data is latched into these registers when a strobe pulse is given. This latch takes place on the rising edge of the strobe (STROBE) pulse. The data is held in these latches regardless of changes appearing on the input data line since latches L0 through L7 are disabled on the rising edge of the strobe pulse. The strobe pulse (TTL level) is buffered by inverter I3 and complemented by inverter I4 to provide the proper levels. After the data is latched (L0 through L15), it is then ready for the parallel-to-serial conversion. The parallel-to-serial conversion takes place in registers R1 through R14. On each phase shift (ΦS) clock, data is transferred upward toward the memory address driver. Figure 10 shows a timing diagram of this function.

A detailed discussion of the input multiplexer and latch, parallel-to-serial shift register, memory address drivers, and memory address select circuits follows. The last topic discussed in this subsection is the level converter and input protection circuit.

a.  Input  Multiplexer  and  Latch 

Figure 11 shows a single-input multiplexer. Transistors M1 and M2 provide two different data paths (A or B). The SELECT line is presented to the gate of M1 and its complement to M2. A high logic level on the SELECT line will turn transistor M1 on and allow data on the A line to propagate to point C. Similarly, taking the SELECT line low will produce a logic high on the gate of M2; then, data on line B is allowed to propagate to point C.

Figure 10. Timing Diagram of the Input, Load, and Shift Sequences


The input latch circuitry is implemented by using two N-channel MOS (NMOS) latch structures. Since the NMOS latch structure is the basic building block that is used in the input latch circuitry as well as in other portions of the PIPE LSIC, a brief discussion of this circuit is given.

The block diagram and schematic of an NMOS latch are shown in Figure 12. This latch is composed of an input data path, transistor M1, followed by two NMOS inverters (M2, M3) and (M4, M5), and a feedback path through transistor M6. These two inverters are constructed with depletion loads (M2, M4) and enhancement drivers (M3, M5).

Figure 11. Circuit Diagram of the Input Multiplexer

The operation of the latch circuit is as follows. Data enters the latch through M1 and appears at node 1 when the LATCH line is high (logic 1). If the data entering is high, then M3 will conduct, pulling node 2 low (logic 0). Since the gate of M5 is connected to node 2, transistor M5 will turn off and let transistor M4 pull node 3 high, which is the same information as at the input. The LATCH line can now be pulled low, causing the complementary $\overline{\text{LATCH}}$ line to go high. This high level on the gate of M6 will cause M6 to conduct, providing a feedback path from node 3 to node 1. Since M1 is now in a nonconducting state, data changing on the input DATA line will have no effect on the latched data. If the data initially entering the latch is low, a low level will be latched similarly. A timing diagram of possible inputs and latching pulses is also shown in Figure 12.

The input latch circuitry incorporates a D-type latch action, as shown in Figure 13. This edge-triggered circuit provides greater noise immunity and shorter data-valid lengths than a level-triggered latch. The D-type latch function is implemented by using two NMOS latches, as discussed above, clocked on opposite phases of the strobe (STROBE) pulse. The STROBE pulse is provided by the user, while its complement, $\overline{\text{STROBE}}$, is generated on the LSIC.

Figure 14 is a partial block diagram of the PIPE input structure. The input data can enter this structure either through the S input lines or from a previous stage, as determined by the parallel or serial (PAR/SER) control line. Data enters latch DL2 when the strobe line is pulsed. On each successive strobe pulse, data shifts to the next latch. For example, consider that the strobe line is pulsed high and data enters from the S inputs to latch DL2. On the next strobe pulse, the data now in DL2 will shift to DL1 and the new data will enter DL2. On the next strobe pulse, data in DL1 enters DL0, data in DL2 enters DL1, and the new data enters DL2. Once all of the input latches have been loaded, the parallel-to-serial conversion can be initiated.
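The latch behavior and the DL2, DL1, DL0 shifting sequence described above can be mimicked with a small behavioral model. This is a sketch at the logic level only, assuming ideal switching with no delays; the class and function names are hypothetical, and the model does not represent the transistor-level circuit.

```python
class TransparentLatch:
    """Follows DATA while LATCH is high; holds the last value while LATCH is low."""

    def __init__(self):
        self.q = 0

    def step(self, data, latch):
        if latch:
            self.q = data   # input path conducts, output tracks the input
        # latch low: input path is off and the feedback path holds the value
        return self.q


class DLatch:
    """Two transparent latches clocked on opposite phases of the strobe."""

    def __init__(self):
        self.master = TransparentLatch()
        self.slave = TransparentLatch()

    @property
    def q(self):
        return self.slave.q

    def step(self, data, strobe):
        self.master.step(data, latch=not strobe)              # open while strobe is low
        return self.slave.step(self.master.q, latch=bool(strobe))  # captures on strobe high


def strobe_pulse(chain, new_data):
    """Apply one strobe pulse (low phase, then high phase) to [DL2, DL1, DL0]."""
    for strobe in (0, 1):
        data = new_data
        for stage in chain:
            held = stage.q            # value this stage presents to the next one
            stage.step(data, strobe)
            data = held


dl2, dl1, dl0 = DLatch(), DLatch(), DLatch()
history = []
for sample in (5, 6, 7):              # hypothetical input words
    strobe_pulse([dl2, dl1, dl0], sample)
    history.append((dl2.q, dl1.q, dl0.q))
print(history)
```

Each pulse moves every stored word one latch along the chain while the new word enters DL2, matching the sequence given in the text.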

b. Parallel-to-Serial Shift Registers

Figure 13. Schematic Diagram of the Input D-Latch

Figure 14. Partial Block Diagram of the Input Stage Architecture (3 Stages of 9)

Figure 15. Partial Block Diagram of the Parallel-to-Serial Conversion (4 Bits of One 8-Bit Word)

A partial block diagram of the parallel-to-serial shift registers is shown in Figure 15. Parallel input data enters through a control multiplexer when the parallel load (PARALLEL LOAD) line is pulsed, and enters the dual register (two NMOS latches). Once the PARALLEL LOAD line changes to a low logic state, the phase shift (ΦS) clock will shift data bits from the previous register to the next register. The phase shift clock is generated on-chip by OR-ing the parallel load, the Φ clock, and the V_P voltage level. This operation is synchronous with the two-phase nonoverlapping clocks Φ and $\overline{\Phi}$. These clock pulses are generated on-chip from the master clock supplied by the user.
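The conversion can be sketched in software as follows. This is an illustration only: the generator name `bit_serial_addresses` is hypothetical, and both the least-significant-bit-first shift order and the mapping of input word i to address bit i are assumptions, since the text does not state them.

```python
def bit_serial_addresses(words, bits=8):
    """Yield one ROM address per shift clock, least significant bit first."""
    regs = list(words)                    # parallel load: one register per word
    for _ in range(bits):
        address = 0
        for i, r in enumerate(regs):
            address |= (r & 1) << i       # bit at the serial output of register i
        yield address
        regs = [r >> 1 for r in regs]     # shift every register by one position


all_ones_lsb = list(bit_serial_addresses([1] * 9))
single_bit = list(bit_serial_addresses([0b10] + [0] * 8))
print(all_ones_lsb, single_bit)
```

Nine input words give a 9-bit address on every shift clock, and an 8-bit word length gives eight addresses per data block.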

c. Memory Address Drivers

The memory address driver is the last stage in the parallel-to-serial shift register. The circuit diagram of the memory address driver is shown in Figure 16. This circuit receives each shifted input data bit and latches this data for one clock cycle. The latched data is buffered to drive the memory address decoders.


d.  Programming  Address  Select 

The V_PP line is taken to 25 volts during the programming of the EPROM. A special V_PP voltage divider (Figure 17) enables the programming address select multiplexer so that the input data lines can be used as memory address inputs (Figure 18). The V_PP voltage divider circuit also provides the program enable (PE) for the EPROM. This makes the memory easy to program, and no special input bit shifting is required. In normal operation, the V_PP line is kept at a 5-volt level.

e. Level Converter and Input Protection

The PIPE IC operates on a single +5-volt supply (V_CC), and all inputs and outputs are CMOS- and TTL-compatible. Most of the on-chip circuitry requires a 0- to 5-volt level for logic lows and highs, respectively. Therefore, a level converter is necessary to convert TTL levels to the MOS levels needed. A TTL-to-MOS input buffer is shown in Figure 19. It consists of two inverters in series. Also, input protection from static damage is provided on the input by M1 and R1. This level buffer is used on the appropriate inputs.


3. EPROM

Several semiconductor ROM technologies were considered as candidates for implementing the memory function of the ROM-accumulate algorithm. Prime considerations in the selection of the optimum ROM technology for the PIPE LSIC are:

•  User-programmable

•  Ease of reprogramming

•  Military environment constraints

•  Single supply voltage operation

•  Static on-chip logic

•  Established technology

•  Nonvolatile.


The erasable, electrically programmable ROM (EPROM), N-channel MOS (NMOS) technology was judged to be the best technology to meet the requirements of the program. The advantage of an EPROM technology is that it is user-programmed and not mask-programmed. Another important advantage of the EPROM is that the program can be erased. A transparent quartz window covers the EPROM package. Erasing the EPROM is a simple matter of exposing the window to ultraviolet light. After erasing, the EPROM can be programmed again.

Figure 17. V_PP Voltage Divider Circuit (input V_PP, 5 to 25 volts; output V_P, 0 to 5 volts, to address select and program enable (PE))

The EPROM technology allows a common chip in the inventory to be personalized for a particular algorithm, and it can also be easily reused later for an entirely different algorithm.

Figure 18. Programming Multiplexer

Figure 21. Block Diagram of the 512 × 12 EPROM Configuration

Figure 22. EPROM Cell Structure (A. bit plane cell, 8 storage devices; B. cross-section of EPROM transistor showing drain, gate, floating gate, and source; C. memory storage device)

By supplying the correct addresses (A0 through A8), the X- and Y-decoders select the proper 12 memory storage devices where logic highs or logic lows have been previously stored. These 12 logic levels (bits) are then latched into the memory data latch, which feeds the shift-and-accumulate circuitry.

The X and Y decode circuitry is shown in Figure 23. The input address and its complement are hard-wired to the decode driver transistors (M0 through M5), providing the necessary BCD-to-decimal decode. The six drivers for the X decode provide 2⁶ different combinations, resulting in 64 possible selections. At the same time, the Y decode provides 2³ combinations, resulting in eight possible selections; thus, by using a 9-bit address, it is possible to select 2⁹ or 512 storage locations.

Once the address is supplied and the proper storage location has been selected, it is then necessary to determine the logic level stored, being either a logic high or a logic low. This function is performed by the sense amplifier shown in Figure 24.

When the proper selection has been made in the decode circuitry, the sense amplifier determines the logic state stored there by measuring the charge stored on the floating gate of the storage transistor. Once this level is detected, it is buffered to drive an output memory latch that feeds the shift-and-accumulate circuitry.

Before programming, the memory is erased by exposing the chip through the transparent window to high-intensity ultraviolet light (wavelength 2,537 angstroms). The recommended minimum exposure dose is 15 watt-seconds per square centimeter. After erasure (all bits are in the logic high state), logic lows are programmed into the desired locations. A low level can only be erased by ultraviolet light. The programming mode is achieved when V_PP is taken to 25 volts. Data is presented in parallel 12-bit words on Q1 through Q12 of Figure 21. The corresponding low bits are programmed into the memory by taking the lines low for 50 ms.
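The addressing and programming rules above can be summarized in a small behavioral model. This is a sketch under stated assumptions: the class name `Eprom512x12` and its methods are hypothetical, and the choice of which six address bits feed the X decode and which three feed the Y decode is an assumption, since the text gives only the counts. Voltages and timing are not modeled.

```python
class Eprom512x12:
    """Logic-level model of a 512-word by 12-bit ultraviolet-erasable memory."""

    WORDS = 512
    WIDTH = 12
    MASK = (1 << WIDTH) - 1

    def __init__(self):
        self.erase()

    def erase(self):
        # Ultraviolet erasure leaves every bit in the logic high state.
        self.cells = [self.MASK] * self.WORDS

    @staticmethod
    def decode(address):
        """Split a 9-bit address into X (1 of 64) and Y (1 of 8) selections."""
        if not 0 <= address < 512:
            raise ValueError("address must fit in 9 bits")
        return address & 0x3F, address >> 6   # assumed bit assignment

    def program(self, address, word):
        # Programming can only pull bits low; a low bit stays low until erasure.
        self.cells[address] &= word & self.MASK

    def read(self, address):
        return self.cells[address]


rom = Eprom512x12()
rom.program(5, 0xABC)
first = rom.read(5)
rom.program(5, 0xFFF)     # writing highs over a programmed word changes nothing
second = rom.read(5)
print(hex(first), hex(second), Eprom512x12.decode(511))
```

The model makes the reuse cycle explicit: a word can be changed from high to low by programming, but restoring any bit to high requires erasing the whole array.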

4  Shilt-and-Aii  annulate 

I  icine  2"i  is  a  block  diagram  ol  the 
>ii.!i  iind-accimnilate  cuvuittv  which  con- 
'li'icl'  a  snl,i-ol-pioducts  Iroili  a  hil-hv-bit 
parallel  s.aial  arithmetic  operation. 

There are 12 data inputs that come from the EPROM section, the control lines, and 20 output lines. Since the ROM-accumulate technique produces no round-off error, all 20 bits are saved and presented as outputs.

Figure 26 shows the operational architecture of the shift-and-accumulate circuitry.

Twelve-bit memory data is presented to the full adder "A" inputs on the rising edge of the phase clock, Φ, and latched on the falling edge of Φ. The sum-and-carry results are latched into a master-slave register on the next rising edge of Φ. By feeding the sum-and-carry latch outputs forward (left) to the remaining full adder inputs, a binary multiplication is performed each clock cycle.

Figure 23. Schematic Diagram of the X and Y Matrix Decoders: (A) X-Decode (1 of 64), (B) Y-Decode (1 of 8)

Figure 24. Sense Amplifier and Buffer Circuitry

Figure 25. Block Diagram of the Shift-and-Accumulate Section (inputs: TCC, memory data; output: output data)

The sum-and-carry latches are cleared on the rising edge of the clear accumulator (CLEAR ACC) pulse, which initiates the shift-and-accumulate operation for a given input data block. Sign magnitude or 2's complement memory data can be used by activating the true (TRUE) pulse or complement (COMP) pulse full adder control lines, respectively. Individual sums and carries are latched into the output adder on the falling edge of the latch accumulate (LATCH ACC) pulse, and the final result is valid on the next rising edge of the LATCH ACC pulse.

The  total  computation  requires  2B  clock  cycles  per  input  data  block,  where  B  is  equal  to  the 
number  of  bits/input  word. 
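
The ROM-accumulate operation described above can be sketched in software. The following is a minimal model, not the chip's implementation: the function names are hypothetical, the weights are arbitrary integers, and the 12-bit width of the stored partial products and the clock-level pipelining are not modeled. It assumes 2's complement inputs processed least significant bit first, with the sign-bit slice subtracted.

```python
def build_partial_products(weights):
    """Entry a holds the sum of weights[i] for every bit i set in address a."""
    n = len(weights)
    return [sum(w for i, w in enumerate(weights) if (a >> i) & 1)
            for a in range(1 << n)]


def rom_accumulate(table, xs, bits):
    """Sum of products of the table's weights with 2's complement inputs xs."""
    acc = 0
    for b in range(bits):
        # Bit-slice address: bit b of every input word, one address bit per input.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        term = table[addr] << b
        # The most significant slice carries the negative weight of the sign bit.
        acc += -term if b == bits - 1 else term
    return acc


weights = [3, -1, 4, 1, -5, 9, 2, -6, 5]          # nine illustrative weights
xs = [17, -128, 127, 0, -1, 42, -77, 5, 99]       # nine 8-bit input words
table = build_partial_products(weights)           # 2**9 = 512 entries
result = rom_accumulate(table, xs, bits=8)
expected = sum(w * x for w, x in zip(weights, xs))
```

With nine inputs the table has 512 entries, which matches the 512-word memory addressed by the X and Y decoders; only table lookups, shifts, and additions are needed, with no multiplier.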

The logic implementation and truth table for the PIPE LSIC full adder function are shown in Figure 27. Note that the outputs of this circuit are complemented sum (QS) and carry (QC), allowing a minimum number of components. The full adder schematic diagram is shown in Figure 28. The addend input includes a latch from which the true or complemented data may be selected. Transistors M1, M2 through M5, M6, and M7 and M8 are a pass gate, a two-stage inverter, a feedback gate, and select gates, respectively. Transistor M14 is the load device for the carry NOR, with the series string of transistors, M9 and M10, ANDing the addend and augend inputs; the paralleled devices, M12 and M13, ORing the addend and augend; and the series device, M11, ANDing the OR result with the previous stage carry. Comprising the sum NOR are the following: the load device, M22; the paralleled transistors, M15 through M17, that OR the three inputs; the series transistor, M18, that ANDs the OR result with the CARRY output; and the series string of transistors, M19 through M21, that ANDs the three inputs. This circuit requires only true level inputs and a total of 16 devices, excluding the latch.
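
The gate structure described above can be checked against ordinary binary addition. This is a small sketch with a hypothetical function name; it assumes the carry gate computes NOT[AB + (A + B)C] and the sum gate computes NOT[(A + B + C)QC + ABC], which is how the transistor groupings in the text read.

```python
from itertools import product


def full_adder_complemented(a, b, c):
    """Return (QS, QC): the complemented sum and complemented carry."""
    # Carry gate: NOT(A.B + (A + B).C)
    qc = 1 - ((a & b) | ((a | b) & c))
    # Sum gate: NOT((A + B + C).QC + A.B.C), reusing the complemented carry
    qs = 1 - (((a | b | c) & qc) | (a & b & c))
    return qs, qc


truth_table = {
    (a, b, c): full_adder_complemented(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
}
```

Reusing QC inside the sum gate is what lets the circuit work from true-level inputs only.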

A TCC (2's complement coefficient) control line is provided by the user. When operating with signed magnitude input data less than 8 bits in word length, the TCC line can be taken to a digital low to ensure that the upper sign bits are set correctly.


Figure 27 logic expressions: QC = NOT[AB + (A + B)C]; QS = NOT[(A + B + C)QC + ABC]

Figure 28. Schematic Diagram of the Full Adder With Input Data Latch (control inputs COMP and TRUE)
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LSUM 


LCAR 


Figure 29. Schematic Diagram of the Carry-Sum Latch


The sum-and-carry latch shown in Figure 29 is an inverting dual master-slave configuration which accepts data on the falling edge of Φ, presents complemented data at the output on the next rising edge of Φ, and latches the output on the subsequent falling edge of Φ. Pass transistors M1 and M2 and feedback transistors M23 and M24 are enabled by one clock phase, while pass devices M13 and M14 and feedback devices M9 and M10 are enabled by the opposite phase. Transistors M3 through M8 and M11 and M12 implement the dual master two-stage inverters, and transistors M15 and M16, M19 through M22, and M25 and M26 implement the dual slave two-stage inverters. Devices M17 and M18 are paralleled with M15 and M16, respectively, in the dual first-stage slave inverters to provide a clear capability.

The output adder circuitry uses the charge storage capabilities of NMOS devices to implement a compact, high-speed, look-ahead carry function. Figure 30 shows a 4-bit 2's complement adder. The exclusive OR (XOR) circuitry is static NMOS logic.

The NOR gates shown are a subset of the XOR gates and, therefore, require no extra devices to realize a carry decode. The fast carry is done with dynamic ratioless circuitry (M5 through M15), where capacitive nodes n0 through n3 can temporarily store clocked signals. A LATCH ACC pulse precharges nodes n0 through n3 to a logic high voltage through transistors M5 through M8. When the LATCH ACC pulse goes low, nodes n0 through n3 are selectively discharged, based on the output voltages of the input NOR and XOR gates existing on nodes n4 through n11. When nodes n0 through n3 have settled, the complemented sum, S, of A and B appears at the outputs of the final XOR gates. The dynamic ratioless carry allows the adder to operate at high speeds with a minimum number of transistors and low power dissipation.
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TCD  ’  COMP  TRUE 


TCD = TWO'S COMPLEMENT DATA        OUTPUT BUFFER


... MUX and will produce a LATCH OUTPUT pulse via a NOR gate on the same Φ clock. The output of this NOR gate is also buffered to provide an external DATA VALID pulse. A timing diagram of this function is shown in Figure 37. The latch accumulator and latch output waveforms are shown for 1- through 8-bit input words.

The control multiplexer is shown schematically in Figure 38. This multiplexer consists of three input stages (TTL level) which provide the necessary control logic to drive the decode circuitry. The decode circuitry selects the proper data path by turning on the associated transistor (M20, M25, M30, M35, M40, M45, M50, and M55).

A block diagram of a D-controller latch is shown in Figure 39. This latch is implemented by using two NMOS latches in cascade, clocked on opposite phases (Figure 36B), which produces a D-type latch function.


A)  BLOCK  DIAGRAM 


B)  CIRCUIT  SCHEMATIC 
Figure 36. Controller Set/Reset Latch Diagram


A circuit schematic of the clock buffer BF1 is shown in Figure 40. This buffer receives the master clock (TTL level) off-chip and derives the two phases (0 to 5 volts) necessary to drive the on-chip circuitry. Input protection is provided by M14 and R1.
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Figure  37.  Controller  Timing  Diagram 
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(A)  LOGIC  DIAGRAM 
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(B)  CIRCUIT  SCHEMATIC 
Figure  41.  NOR  Gate  Bus  Driver  Diagram 


The NOR gates, N1, N2, N3, and N4, are required to drive internal buses and, therefore, not only provide a NOR function but also provide the required buffering. A logic diagram and circuit schematic are shown in Figure 41.

7. Programmable Sum-of-Products Operator

To fully explore the architectural implications of a programmable sum-of-products operator, a hardware implementation of Equation 1 has been designed and fabricated under a parallel contract with Carnegie-Mellon University. The breadboard can operate on either sliding or fixed blocks of data. A block diagram of the hardware implementation is shown in Figure 42. The breadboard consists of nine input latches, nine parallel-in serial-out shift registers, a fast 512 X 12-bit memory for temporary storage of the partial products, an EPROM for permanent storage of the partial products, shift-and-accumulate circuitry, tri-state output latches, and control circuitry.

The input latch structure is hardware- or software-selectable for either serial data entry at DATA INPUT P or parallel data entry at DATA INPUT S, DATA INPUT T, and DATA INPUT P. This facilitates the implementation of a nine-point transversal filter or a 3 by 3 sliding window operator, respectively. The input data word length is hardware-selectable from 1 bit to 8 bits and is hardware- or software-selectable as 2's complement or sign magnitude format.

The weighting coefficients, W_i, determine a set of partial products which are stored in a 512 X 12-bit high-speed random access memory (RAM). Partial products may be down-loaded on the data bus from an external source by a controlling processor. The partial products are hardware-selectable as 2's complement or sign magnitude format. The partial products obtained from the RAM during operation are summed in a carry-save accumulator with data shifting to weight the significance of each partial product. Output data (20 bits) is buffered by a tri-state output latch.

The data input section consists of nine latches connected to form a 9-stage-long-by-8-bit-wide shift register. The outputs of each stage are also connected to nine parallel-to-serial conversion registers which form the data for the ROM-accumulate operation. The data input operation is controlled by the INPUT CONTROLLER. The controller has a hardware- or software-selected BCD value count as one of its inputs. The INPUT RESET line is pulsed low to initialize the controller. The INPUT STROBE line shifts data through the input latches and clocks the controller on its leading edge. After enough strobe pulses, the INPUT CONTROLLER generates a LOAD pulse which latches the data from the input latches into the parallel-to-serial registers and starts the high-speed asynchronous CONTROLLER. The LOAD pulse also resets the INPUT CONTROLLER so that new data can be shifted into the input latches while the ROM-accumulate operation is taking place. A second INPUT RESET pulse is not required.

The ROM-accumulate CONTROLLER operates from an asynchronous internal 16.7-MHz oscillator (60-ns period). This is the maximum clock rate for the components selected for the shift registers and accumulator. After a LOAD pulse is received from the INPUT CONTROLLER, the CONTROLLER sequences the operations of the PARALLEL-to-SERIAL REGISTERS, the PARTIAL PRODUCT MEMORY, the SHIFT-AND-ACCUMULATE function, and the OUTPUT LATCH. It also provides signals which can be monitored by an external processor if desired. After the final INPUT STROBE pulse, the result of the ROM-accumulate calculation is available at the TRI-STATE OUTPUT LATCH in B + 5 internal clock cycles. The additional five clock cycles are due to the pipelined architecture. Thus,

    t_ROM-accumulate = (60B + 300) ns

where t_ROM-accumulate is the time required to process the data and B is the number of significant bits in each input data word. For 8-bit data this is 780 ns. The ROM-accumulate CONTROLLER will bring the READY line high when the output is available and will not allow another result to overwrite the output latch data until the READ ACK input is pulsed high by the external processor. The OUTPUT BUFFER FULL line goes high, the INPUT STROBE line is inhibited, and the internal oscillator is stopped if an overwrite condition exists. The tri-state outputs are enabled by pulling the external OUTPUT ENABLE line low.

Owing to the pipeline organization of the ROM-accumulator hardware, the PARALLEL-to-SERIAL REGISTERS can be loaded as soon as B bits of data have been shifted out of them, i.e., B internal clock cycles of 60 nanoseconds after the last INPUT STROBE pulse. For 8-bit data, this is 480 ns, which gives a throughput rate of about 2 MHz. If the INPUT CONTROLLER generates a LOAD pulse before the PARALLEL-to-SERIAL REGISTERS are emptied, the INPUT BUFFER FULL line will go high and the INPUT STROBE line will be inhibited until the registers are emptied to prevent overwriting any data.

From the above discussion, it can be seen that the maximum input data rate depends on the number of bits in the input data words (B) and the number of INPUT STROBE pulses (N) between parallel-to-serial conversion operations performed. For sliding 3 X 3 or 9 X 1 filter applications, one strobe pulse is needed between parallel-to-serial conversions. For nonsliding 3 X 3 window operations, three strobe pulses are needed and, for a 9 X 1 transform, nine strobe pulses are needed. The maximum input data rate is given by

    f_MAX IN = N / (60B ns)

For sliding window or transversal filter applications with 8-bit data, f_MAX IN = 2 MHz. In an application such as an 8 X 1 transform with 8-bit data, f_MAX IN = 16 MHz.

The maximum output data rate is always given by

    f_MAX OUT = 1 / (60B ns)
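
The timing figures quoted above follow from the 60-ns clock period. The sketch below simply restates that arithmetic; the function names are hypothetical, and the input-rate expression N/(60B ns) is an assumed reading of the printed formula, chosen because it reproduces the quoted 2-MHz and 16-MHz values.

```python
CLOCK_NS = 60  # internal oscillator period quoted in the text


def latency_ns(bits):
    """Time from the last INPUT STROBE to a valid output: B + 5 clock cycles."""
    return CLOCK_NS * bits + 300


def reload_interval_ns(bits):
    """Earliest time the parallel-to-serial registers can be reloaded: B cycles."""
    return CLOCK_NS * bits


def max_input_rate_mhz(strobes, bits):
    """Assumed form: N strobe pulses per B-cycle conversion interval."""
    return strobes * 1000.0 / (bits * CLOCK_NS)


def max_output_rate_mhz(bits):
    """One output per B-cycle conversion interval."""
    return 1000.0 / (bits * CLOCK_NS)
```

For 8-bit data, `latency_ns(8)` gives 780 and `reload_interval_ns(8)` gives 480, and `max_input_rate_mhz(1, 8)` is a little over 2 MHz.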

The breadboard also can operate in parallel with two other breadboards to provide throughput at real-time (TV) data rates. Three control lines, shown dashed in Figure 42, are provided to synchronize this operation. These lines, TV RESET OUTPUT, TV STROBE, and CLEAR OUTPUT ENABLE, are normally not connected when operating a single breadboard.

The programmable sum-of-products breadboard is 12 X 12 X 5 inches, weighs " pounds, and dissipates 15 W. Figure 43 is a photograph of this breadboard. For size comparison, a 40-pin dual-in-line package is shown adjacent to the electronics boards. With the exception of the input controller, output controller, and the EPROMs for program downloading, the entire electronic boards have been integrated onto the PIPE LSIC. This represents monolithically integrating approximately "5 discrete integrated circuits.

B.  PIPE  LSIC  APPLICATIONS 

This subsection is a discussion of applications of the PIPE LSIC described in the preceding section. The types of operations as well as operational constraints are discussed.

1. Introduction

Many digital signal-processing and image-processing algorithms require operations of the form

    Y = \sum_{i} W_i X_i        (5)

where the W_i represents a set of fixed weighting coefficients and the X_i represents a set or sequence of input values. Equation 5 can be used to calculate the coefficients of various transforms such as Fourier, Cosine, Hadamard, Haar, etc. Where two-dimensional transforms are needed, successive one-dimensional transforms can be used if the transforms are separable. For image-processing applications, Equation 5 defines the discrete convolution of a two-dimensional input image with a convolution array. These mathematical operations are based on the adjacent pixel values and are termed neighborhood operators. Examples of neighborhood operators include noise smoothing, edge crispening, linear edge enhancement, etc.

The following subsections discuss the applicability of the PIPE LSIC to matrix operations for calculating transform coefficients and to neighborhood operator calculations.


2. Input/Output Considerations


Key to the full use of the PIPE LSIC is an understanding of the input/output relationships. Figure 44 shows the input/output pins of the PIPE LSIC. The PAR/SER SELECT pin determines whether the LSIC operates on 9 X 1 or 3 X 3 blocks of data, either of which may be sliding or nonsliding. The input word length is designated by the 4-bit WORD LENGTH pins. Eight-bit parallel input words S, T, or P are loaded into the input latches by the INPUT STROBE and converted into bit-serial words by the MASTER CLOCK and LOAD. The PIPE LSIC is capable of operating on 2's complement or sign magnitude data as determined by the state of the 2's COMP pin. A DATA VALID pulse informs the user when the PIPE LSIC has completed a calculation. The 20-bit parallel output is obtained by bringing the enable (EN) pin low to activate the tri-state outputs.

The PIPE LSIC requires a single 5-volt power supply and, during programming of the EPROM, a 25-volt programming voltage.

A total of 58 pins is required for full 8-bit input and 20-bit output operations. If the input word length is restricted to 6 bits and only 8 bits of the output are used, a total of 40 pins are needed. The pins required for word-length selection, input type (parallel or serial), and data format (2's complement or magnitude) can be eliminated by on-chip bonding to the VDD or ground pin for further pin reduction on fixed applications.


Figure 44. PIPE LSIC I/O Considerations (pin labels include VDD, VPP, GND, EN, and MASTER CLOCK)


The operational characteristics of the PIPE LSIC are discussed in a later subsection. These characteristics result in part from the architecture selected to implement the PIPE LSIC and in part from the circuit design of the architecture. After a discussion of the types of calculations the PIPE LSIC is capable of, these characteristics are easily understood and appreciated.

3. Matrix Operation


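The PIPE LSIC is ideally suited for matrix operations of the form

    Y = \begin{bmatrix} W_0 & W_1 & \cdots & W_8 \end{bmatrix} \begin{bmatrix} X_0 \\ X_1 \\ \vdots \\ X_8 \end{bmatrix}        (6)

Equation 6 represents the product of a row vector, W, and a column vector, X. Most signal-processing operations require the product of a weighting matrix and an input vector. This can be represented by multiple use of the PIPE LSIC as

    Y = W X

    \begin{bmatrix} Y_0 \\ Y_1 \\ \vdots \\ Y_8 \end{bmatrix} = \begin{bmatrix} W_{00} & W_{01} & \cdots & W_{08} \\ W_{10} & W_{11} & \cdots & W_{18} \\ \vdots & & & \vdots \\ W_{80} & W_{81} & \cdots & W_{88} \end{bmatrix} \begin{bmatrix} X_0 \\ X_1 \\ \vdots \\ X_8 \end{bmatrix}        (7)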

A diagram of the PIPE LSIC configured to implement Equation 7 is shown in Figure 45. Nine PIPE LSICs are used, with each PIPE programmed with the weighting coefficients of one row of W. This eliminates reprogramming the PIPE LSIC, thus increasing the speed of the calculation. All the P inputs of the PIPE LSICs are connected and data is entered sequentially. Nine sample periods are required to load the PIPE LSICs. The Y vector is calculated in parallel for the particular values of X. A valid output is available every sample period if operating on sliding 9 X 1 data, while nine sample periods are required to produce an output if the operation of Equation 7 is performed on nonsliding 9 X 1 data blocks.
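
The arrangement described above, with one element per row of W and a shared serial input, can be modeled in a few lines. This is an illustrative sketch only: the function names are hypothetical, each call to `sum_of_products` stands in for one PIPE LSIC, and the sample ordering inside the sliding window is an assumption.

```python
def sum_of_products(weights, xs):
    """Stand-in for one element programmed with one row of W."""
    return sum(w * x for w, x in zip(weights, xs))


def matrix_times_vector(w_rows, xs):
    """All elements see the same input vector and work in parallel."""
    return [sum_of_products(row, xs) for row in w_rows]


def sliding_outputs(w_rows, samples, n=9):
    """After the first n samples, every new sample yields one output vector."""
    return [matrix_times_vector(w_rows, samples[k:k + n])
            for k in range(len(samples) - n + 1)]


def nonsliding_outputs(w_rows, samples, n=9):
    """One output vector per complete block of n samples."""
    return [matrix_times_vector(w_rows, samples[k:k + n])
            for k in range(0, len(samples) - n + 1, n)]


identity = [[1 if r == c else 0 for c in range(9)] for r in range(9)]
samples = list(range(18))
sliding = sliding_outputs(identity, samples)
blocks = nonsliding_outputs(identity, samples)
```

With 18 samples the sliding mode produces 10 output vectors and the nonsliding mode produces 2, which reflects the difference in output rate between the two modes.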


Figure 45. Programmable Image Processing Element Configured to Implement Y = WX (nine elements holding weights W00 through W88 and producing outputs Y0 through Y8)


The configuration shown in Figure 45 can be used to calculate coefficients of various transforms.


a. One-Dimensional Transforms

The discrete cosine transform of a data sequence is defined as

    Y_0 = \frac{\sqrt{2}}{M} \sum_{i=0}^{M-1} X_i

    Y_k = \frac{2}{M} \sum_{i=0}^{M-1} X_i \cos\frac{(2i+1)k\pi}{2M}, \quad k = 1, 2, \ldots, M-1        (8)

Assuming an 8 X 1 cosine transform is desired, Equation 8 simplifies to

    Y_k = \frac{1}{4} \sum_{i=0}^{7} X_i \cos\frac{(2i+1)k\pi}{16}, \quad k = 1, 2, \ldots, 7        (9)

or, in matrix form,

    [Y_0]   [ 0.177  ...   0.177 ] [X_0]
    [Y_1]   [ 0.245  ...  -0.245 ] [X_1]
    [Y_2]   [ 0.231  ...   0.231 ] [X_2]
    [Y_3] = [ 0.208  ...  -0.208 ] [X_3]        (10)
    [Y_4]   [ 0.177  ...   0.177 ] [X_4]
    [Y_5]   [ 0.139  ...  -0.139 ] [X_5]
    [Y_6]   [ 0.096  ...   0.096 ] [X_6]
    [Y_7]   [ 0.049  ...  -0.049 ] [X_7]

Figure 46. Weighting Coefficient Matrices for Various 8 X 1 Transforms (panels include HAAR)

The discrete cosine transform can be implemented using the configuration of Figure 45.
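
The weighting matrix for the 8 X 1 cosine transform can be generated numerically. The sketch below assumes a scale factor of sqrt(2)/M for the first row and 2/M for the remaining rows; that normalization is inferred from the legible matrix entries (0.245, 0.231, 0.208) rather than taken from a clean statement in the text, and the function name is hypothetical.

```python
import math


def dct_weights(m=8):
    """Rows of the cosine-transform weighting matrix, one row per output Y_k."""
    rows = []
    for k in range(m):
        scale = math.sqrt(2) / m if k == 0 else 2.0 / m
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * m))
                     for i in range(m)])
    return rows


W = dct_weights()
first_column = [round(row[0], 3) for row in W]
last_column = [round(row[-1], 3) for row in W]
```

Each row of `W` would be programmed into one element as its set of weighting coefficients.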


The weighting coefficient matrices for 8-by-1 Hadamard, Walsh, and Haar transforms are shown in Figure 46. The transforms described above all have real coefficients.


The Fourier transform produces complex coefficients because of a complex weighting matrix. The discrete Fourier transform of a sequence is defined by

    Y_k = \sum_{i=0}^{M-1} X_i \exp(-j 2\pi i k / M)        (11)

Equation 11 can be rewritten as

    Y_k = \sum_{i=0}^{M-1} X_i \cos(2\pi i k / M) - j \sum_{i=0}^{M-1} X_i \sin(2\pi i k / M)        (12)

which is the same form as Equation 5. The weighting coefficients are now complex, requiring additional PIPE LSICs to calculate the real and imaginary Fourier coefficients. This is shown in Figure 47.
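
Splitting the complex weighting matrix into a real part and an imaginary part turns each Fourier coefficient into two real sums of products. The following sketch illustrates the idea under an assumed sign convention (negative exponent, no normalization); the function names are hypothetical.

```python
import cmath
import math


def dft_weight_rows(m):
    """Real and imaginary weighting matrices, each usable by real-valued elements."""
    real = [[math.cos(2 * math.pi * k * i / m) for i in range(m)] for k in range(m)]
    imag = [[-math.sin(2 * math.pi * k * i / m) for i in range(m)] for k in range(m)]
    return real, imag


def dft_by_sum_of_products(xs):
    """Each coefficient is formed from two independent real sums of products."""
    real, imag = dft_weight_rows(len(xs))
    return [complex(sum(w * x for w, x in zip(real[k], xs)),
                    sum(w * x for w, x in zip(imag[k], xs)))
            for k in range(len(xs))]


xs = [1, 2, 3, 4, 0, -1, -2, -3]
coefficients = dft_by_sum_of_products(xs)
```

The doubling of the number of real weight rows is what calls for the additional elements mentioned above.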


    H(Z) = \frac{Y(Z)}{X(Z)} = \frac{A_0 + A_1 Z^{-1} + A_2 Z^{-2}}{1 - B_1 Z^{-1} - B_2 Z^{-2}}
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Figure  49.  Using  PIPE  LSIC  to  Implement  Second -Order  Filter  Function 

The difference equation is

    y_k = a_0 x_k + a_1 x_{k-1} + a_2 x_{k-2} + b_1 y_{k-1} + b_2 y_{k-2}

or

    y_k = \begin{bmatrix} a_0 & a_1 & a_2 & b_1 & b_2 \end{bmatrix} \begin{bmatrix} x_k \\ x_{k-1} \\ x_{k-2} \\ y_{k-1} \\ y_{k-2} \end{bmatrix}        (17)

Equation 17 is the same form as Equation 6 and can be implemented using PIPE LSICs as shown in Figure 49. The X data sequence is loaded sequentially into the first PIPE LSIC containing weighting coefficients a_0, a_1, and a_2. The output of the first PIPE LSIC is loaded sequentially into the second PIPE LSIC to complete the calculation of Equation 17.
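
The second-order filter can be written as two cascaded sums of products. This sketch assumes the first element forms the feed-forward terms and the second element adds the feedback terms from its own previous outputs; that wiring is an assumption, since Figure 49 is not reproduced here, and the function name is hypothetical.

```python
def second_order_filter(xs, a, b):
    """y_k = a0*x_k + a1*x_{k-1} + a2*x_{k-2} + b1*y_{k-1} + b2*y_{k-2}."""
    a0, a1, a2 = a
    b1, b2 = b
    x1 = x2 = y1 = y2 = 0.0
    ys = []
    for x in xs:
        feed_forward = a0 * x + a1 * x1 + a2 * x2   # first sum of products
        y = feed_forward + b1 * y1 + b2 * y2        # second sum of products
        ys.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return ys


impulse_response = second_order_filter([1, 0, 0, 0], a=(1, 0, 0), b=(0.5, 0))
```

With a single feedback coefficient of 0.5, the impulse response halves at each step, as expected for a one-pole recursion.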

4. Neighborhood Operators

Many image-processing algorithms such as noise cleaning, edge detection, and edge enhancement can be implemented with the PIPE LSIC operating on sliding 3 X 3 pixel blocks of the input image. A 3 X 3 input array is spatially convolved with a two-dimensional weighting array, as shown in Figure 50. This 3 X 3 two-dimensional spatial convolution is a very powerful image-processing calculation and one for which the PIPE LSIC is ideally suited.

Figure 50. Two-Dimensional Spatial Convolution for 3 X 3 Neighborhood

The following subsections discuss some commonly used neighborhood operators and present some experimental results of the programmable sum-of-products breadboard described in Subsection II.A.

a. Noise Cleaning

Many images contain discrete pixel variations that are a result of noisy sensors and are very objectionable from a user viewpoint. Simple low-pass spatial filtering can eliminate or smooth most such noise since the noise is decorrelated spatially from its surrounding pixels. Figure 51 shows low-pass weighting arrays that can be used as weighting coefficients for the PIPE LSIC. As can be seen, the weighting arrays are normalized to unit weighting to prevent introducing an intensity bias into the processed image.
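
A sliding 3 X 3 low-pass operator can be sketched as follows. The uniform 1/9 array used here is a common low-pass choice and is an assumption; it is not necessarily one of the arrays in Figure 51. The function name is hypothetical, and only pixels with a full 3 X 3 neighborhood are computed.

```python
def convolve3x3(image, weights):
    """Apply a 3 x 3 weighting array at every pixel that has a full neighborhood."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += weights[dr + 1][dc + 1] * image[r + dr][c + dc]
            out_row.append(acc)
        out.append(out_row)
    return out


box = [[1 / 9.0] * 3 for _ in range(3)]   # elements sum to one (unit weighting)
noisy = [[10, 10, 10, 10],
         [10, 19, 10, 10],                # a single noisy pixel
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
smoothed = convolve3x3(noisy, box)
flat = convolve3x3([[10] * 4 for _ in range(4)], box)
```

The isolated value of 19 is spread over its neighbors so each output is about 11, while a flat region keeps its value of 10 because the weights sum to one.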

b. Edge Enhancement

Edge enhancement is used to accent the edges of an image to provide a more subjectively pleasing image. Since areas of high frequency (edges) are to be highlighted, the logical operation is a high-pass filter. This can be done spatially, using the PIPE LSIC as a 3 X 3 neighborhood operator. Some weighting arrays that are of the high-pass form are shown in Figure 52. There is no need to normalize these arrays since their elements sum to unity.
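
The effect of a high-pass array whose elements sum to unity can be seen on a simple step edge. The array below (center 5, four neighbors of -1) is a widely used high-pass form and is an assumption; it is not necessarily one of those in Figure 52. The function name is hypothetical.

```python
HIGH_PASS = [[0, -1, 0],
             [-1, 5, -1],
             [0, -1, 0]]


def apply_3x3(image, weights):
    """Integer sum of products over each full 3 x 3 neighborhood."""
    rows, cols = len(image), len(image[0])
    return [[sum(weights[dr][dc] * image[r - 1 + dr][c - 1 + dc]
                 for dr in range(3) for dc in range(3))
             for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


step = [[10, 10, 10, 20, 20, 20] for _ in range(3)]   # vertical step edge
enhanced = apply_3x3(step, HIGH_PASS)
mask_sum = sum(sum(row) for row in HIGH_PASS)
```

Flat regions keep their original level because the weights sum to one, while the pixels on either side of the step undershoot and overshoot, which accents the edge.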

c. Edge Detectors

One of the most distinguishing features of an image is its edges because they provide information on the physical extent of objects within the image. Edges are defined as local discontinuities in the image luminance or amplitude level. There are basically two methods of edge detection: edge enhancement followed by thresholding, and edge fitting.

Figure 52. High-Pass Weighting Arrays

In the edge enhancement/thresholding method, the input image is spatially convolved with a set of linear weighting arrays to produce a set of gradient functions which are, in turn, combined by a linear or nonlinear function to create an edge-enhanced array. To improve edge visibility, the gray level map is compared to a threshold, T: if the gray level is greater than T, an edge is assumed present; if the gray level is less than T, the decision is no edge. The selection of the threshold is very important: if it is too high, some edges will not be detected; if it is too low, noise will be detected as edges.

There are two types of edge-enhancement operators: differential edge detectors, such as Roberts, Prewitt, and Sobel, and template-matching edge detectors, such as compass gradient, Kirsch, three-level, and five-level.


In edge-fitting edge detectors, subregions of the input image are fitted to a two-dimensional model of an edge. If the fit is close, an edge is assumed to exist with the same parameters as the edge model. Edge fitting cannot be implemented with the PIPE LSIC and is not discussed further.
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Figure  53.  Weighting  Arrays  for  Various  Differential  Ldge  Detectors 


d. Differential Edge Detectors

The Roberts edge detector is applied to a 2 X 2 neighborhood of pixels and can be implemented as two spatial convolutions of the input array with a weighting array. The outputs of the two convolutions are combined to produce an edge magnitude to be compared with the selected edge threshold, T. The orientation of the edge can also be calculated from the outputs of the two convolutions.

The Sobel and Prewitt edge operators are applied to 3 X 3 windows of pixels and are implemented by two spatial convolutions of the input image with a 3 X 3 weighting array. Again, the magnitude and orientation of the edge can be calculated from the two convolutions.

Another differential edge detector is the Laplacian operator; however, because of its sensitivity to points and lines, it is not a very efficient edge detector.

The weighting arrays used for the Roberts, Sobel, and Prewitt edge detectors are shown in Figure 53. For each operator, the amplitude of the edge is given by

    A(i,j) = \{ [Y_H(i,j)]^2 + [Y_V(i,j)]^2 \}^{1/2}        (18)

or

    A(i,j) = |Y_H(i,j)| + |Y_V(i,j)|

where Y_H(i,j) is the convolution of the input array and the horizontal weighting array and Y_V(i,j) is the convolution of the input array and the vertical weighting array.
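
The two amplitude forms and the threshold decision can be illustrated with the Sobel arrays. In this sketch the standard Sobel coefficients are used; which of the two arrays the report calls "horizontal" is an assumption, as are the function names and the threshold value.

```python
import math

SOBEL_H = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to left-to-right change
SOBEL_V = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # responds to top-to-bottom change


def window_response(weights, image, r, c):
    """Sum of products over the 3 x 3 window centered on (r, c)."""
    return sum(weights[dr][dc] * image[r - 1 + dr][c - 1 + dc]
               for dr in range(3) for dc in range(3))


def edge_map(image, threshold, use_abs=False):
    """Return (amplitude, is_edge) for each pixel with a full neighborhood."""
    result = []
    for r in range(1, len(image) - 1):
        row = []
        for c in range(1, len(image[0]) - 1):
            yh = window_response(SOBEL_H, image, r, c)
            yv = window_response(SOBEL_V, image, r, c)
            amp = abs(yh) + abs(yv) if use_abs else math.sqrt(yh * yh + yv * yv)
            row.append((amp, amp > threshold))
        result.append(row)
    return result


step = [[0, 0, 10, 10] for _ in range(3)]          # vertical step edge
root_sum_squares = edge_map(step, threshold=20)
sum_abs = edge_map(step, threshold=20, use_abs=True)
```

For an edge aligned with one of the arrays the two amplitude forms agree; they differ for oblique edges, where the sum of absolute values is larger. The absolute-value form avoids the square and square root.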


Figure 54. Implementation of Differential Edge Detectors Using PIPE LSIC


For the Roberts operator, the edge orientation is given by

    \theta(i,j) = \frac{\pi}{4} + \tan^{-1}\left[\frac{Y_V(i,j)}{Y_H(i,j)}\right]        (19)

For the Sobel and Prewitt operators, the edge orientation is given by

    \theta(i,j) = \tan^{-1}\left[\frac{Y_V(i,j)}{Y_H(i,j)}\right]        (20)


The PIPE LSIC can be used to implement the Roberts, Prewitt, or Sobel edge detector. A block diagram of the implementation of these operators is shown in Figure 54. The input data is loaded as three sets of three parallel words. Three sample periods are required to initially load the PIPE LSIC and provide the first output. Succeeding outputs occur each sample period. For the Roberts operator, the weighting coefficients would be rearranged to take into account the 2 X 2 weighting arrays.


e. Template-Matching Edge Detectors

In template-matching edge detection, a set of weighting arrays corresponding to the eight major compass directions (north, northeast, east, etc.) is convolved with the input image. The edge orientation is determined by the direction producing the maximum gradient response greater than the selected threshold. Examples of template-matching weighting arrays are given in Figure 55 for the compass-gradient, Kirsch, three-level, and five-level template-matching operators.
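
Template matching amounts to evaluating eight sums of products on the same window and keeping the largest. The sketch below builds the eight arrays by rotating the outer ring of a Kirsch-style base array (coefficients 5 and -3); the base orientation, the index-to-direction mapping, and the function names are assumptions, since Figure 55 is not reproduced here.

```python
# Outer-ring positions of a 3 x 3 array, clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

KIRSCH_BASE = [[5, 5, 5],
               [-3, 0, -3],
               [-3, -3, -3]]


def rotations(mask):
    """Eight arrays obtained by stepping the outer ring 45 degrees at a time."""
    ring = [mask[r][c] for r, c in RING]
    out = []
    for shift in range(8):
        rotated = [row[:] for row in mask]
        for idx, (r, c) in enumerate(RING):
            rotated[r][c] = ring[(idx - shift) % 8]
        out.append(rotated)
    return out


def response(mask, window):
    return sum(mask[r][c] * window[r][c] for r in range(3) for c in range(3))


def template_match(window, masks, threshold):
    """Return (index of best template or None, maximum response)."""
    responses = [response(m, window) for m in masks]
    best = max(range(len(masks)), key=responses.__getitem__)
    if responses[best] > threshold:
        return best, responses[best]
    return None, responses[best]


masks = rotations(KIRSCH_BASE)
window = [[9, 9, 9], [0, 0, 0], [0, 0, 0]]   # bright band along the top row
match = template_match(window, masks, threshold=50)
```

Because orientation is reported as the index of the winning template, it can only take eight values, which is the quantization of edge orientation inherent in this method.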


The PIPE LSIC can implement the template-matching edge detector as shown in Figure 56. The
input array is loaded as three sets of three parallel words in all of the PIPE LSICs.

Figure 56. Implementation of Template Match Edge Detectors Using PIPE LSIC


The outputs are compared to determine the maximum response and, hence, the orientation of the
edge.

f. Edge Detector Performance Analysis

In an attempt to determine the relative performance of the edge detectors discussed above, an
evaluation has been performed comparing edge response as a function of actual edge orientation,
the probability of correct detection as a function of the probability of false detection, and a figure
of merit as a function of signal-to-noise ratio.⁶

Figure 57 shows edge detector response (amplitude and orientation) as a function of actual
edge orientation.⁶ For the Roberts, Prewitt, and Sobel amplitude response, both the square root
sum of the squares and the sum of the absolute values (Equation 18) responses are shown. As can be
seen, the Prewitt, Sobel, and template-matching amplitude responses are relatively invariant to edge
orientation, while the Sobel operator has the most linear response between actual edge orientation
and detected edge orientation. For the template-matching operators, the difference between actual
and detected edge orientation is large because the template-matching operators measure edge
orientation in a quantized step.
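The effect of measuring orientation in quantized steps can be illustrated with an idealized model. This sketch assumes a detector that always reports the nearest of eight directions spaced 45 degrees apart; real template-matching operators pick the direction of maximum response, so their errors need not follow this model exactly. The function names are hypothetical.

```python
def quantize_orientation(theta_deg, step_deg=45.0):
    """Snap an orientation in degrees to the nearest multiple of step_deg."""
    return (round(theta_deg / step_deg) * step_deg) % 360.0

def quantization_error(theta_deg, step_deg=45.0):
    """Signed difference (degrees) between actual and quantized orientation."""
    q = quantize_orientation(theta_deg, step_deg)
    return (theta_deg - q + 180.0) % 360.0 - 180.0

for actual in (0.0, 10.0, 30.0, 350.0):
    print(actual, quantize_orientation(actual), quantization_error(actual))
```

Under this idealized model the error is zero only at the eight template directions and grows to half a step (22.5 degrees) midway between them.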

The performance of various edge detectors in the presence of additive, white Gaussian noise
can be compared using parametric curves of correct detection probability versus false detection
probability in terms of the detector threshold. Figure 58 shows such curves for vertical and
diagonal edges for both differential edge detectors and template-matching detectors with
signal-to-noise ratios of 1.0 and 10.0. For the differential detectors, the Sobel and Prewitt perform better

Figure 57. Edge-Gradient Amplitude Response and Detected-Edge Orientation as Functions of Actual Edge Orientation (From Reference 6)

6. Operational Characteristics

Several key parameters define the operation of the PIPE LSIC: the type of operation (serial or
parallel, sliding or nonsliding); input word length (6 or 8 bits, etc.); EPROM access time;
accumulator time; and the operating time of the input latches.

Since the PIPE LSIC must access the on-chip EPROM B times (where B is the length of the
input word) to calculate one term of the answer and the accumulator must sum these terms, the
time required to output an answer from the PIPE LSIC is

$$T = \max\left( B\,T_A,\; T_{ACC} \right) \tag{22}$$

where $T_A$ is the on-chip EPROM access time, and $T_{ACC}$ is the time required by the accumulator.
To prevent addressing the memory incorrectly, the time between load pulses must also satisfy
Equation 22.

Figure 61. Sobel and Prewitt Edge Detection Using Programmable Sum-of-Products Breadboard

Recalling that the number of input strobe pulses between load pulses determines whether a
sliding or nonsliding type of operation is performed, the time between strobe pulses ($T_{STROBE}$) can
be given as


$$T_{STROBE} = \max\left( \frac{T_{LOAD}}{N},\; T_{LATCH} \right) \tag{23}$$

where N is the number of input strobes between load pulses, $T_{LOAD}$ is the time between load pulses
as determined by Equation 22, and $T_{LATCH}$ is the operating time for the input latches.

Figure 62. Template Match Edge Detection Using Programmable Sum of Products Breadboard

To maintain proper on-chip timing, the master clock frequency ($f_{CLOCK}$) must be greater than B
times the load frequency ($f_{LOAD}$), or

$$f_{CLOCK} > B\,f_{LOAD} \tag{24}$$


Table 2 summarizes Equations 22 through 24 in terms of input and output data rates for
different types of operations, word length, and EPROM access time (assuming 300-ns accumulator
time, 20-MHz input latch, and 20-MHz master clock (design goals)).

In the serial 8 × 1 sliding type of operation, one strobe pulse is needed between each load
pulse, and the load frequency is calculated from Equation 22 to be 1.25 MHz for 8-bit data and
1.67 MHz for 6-bit data, assuming 100-ns EPROM access time. For serial 8 × 1 nonsliding type of
operations, eight strobe pulses are required between each load pulse; therefore, the maximum load
frequency is 1.25 MHz; however, since data can be loaded in the PIPE input latches independent
of the parallel-to-serial registers, the maximum input data rate is determined by the input strobe
frequency. This can be calculated from Equation 23 to be 10 MHz for both 8- and 6-bit data for
8 × 1 nonsliding type of operations.

For the parallel 3 × 3 sliding type of operation, only one strobe pulse is needed between
loads; thus, from Equations 22 and 23, the load frequency and input data rate (input strobe
frequency) can be calculated to be 1.25 MHz for 8-bit data and 1.67 MHz for 6-bit data, with an
EPROM access time of 100 ns. For parallel 3 × 3 nonsliding type of operations, three strobe pulses
are required between load pulses; therefore, the input data rate is 10 MHz for both 8- and 6-bit
data. The effect of EPROM access time on input data rate is also shown in Table 2.

The output data rate is given by Equation 22, regardless of the type of operation, and is
1.25 MHz for 8-bit data and 1.67 MHz for 6-bit data with 100-ns EPROM access time.
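Some of the rates quoted above can be reproduced with a few lines of arithmetic. The sketch below is hedged: it assumes Equation 22 has the form max(B × access time, accumulator time) and Equation 23 the form max(load period / N, latch period), with the 300-ns accumulator time and a 50-ns latch period (20 MHz) from the stated design goals. The function names are illustrative, and the sketch is checked only against the load frequencies and the 8-bit serial figures quoted in the text.

```python
def load_period_ns(bits, t_access_ns, t_accum_ns=300.0):
    """Assumed form of Equation 22: max(B * T_A, T_ACC), in nanoseconds."""
    return max(bits * t_access_ns, t_accum_ns)

def strobe_period_ns(strobes_per_load, t_load_ns, t_latch_ns=50.0):
    """Assumed form of Equation 23: max(T_LOAD / N, T_LATCH), in nanoseconds."""
    return max(t_load_ns / strobes_per_load, t_latch_ns)

def to_mhz(period_ns):
    """Convert a period in nanoseconds to a frequency in megahertz."""
    return 1000.0 / period_ns

t_load_8 = load_period_ns(bits=8, t_access_ns=100.0)
t_load_6 = load_period_ns(bits=6, t_access_ns=100.0)
print(to_mhz(t_load_8))                        # load/output rate, 8-bit data
print(round(to_mhz(t_load_6), 2))              # load/output rate, 6-bit data
print(to_mhz(strobe_period_ns(1, t_load_8)))   # sliding: one strobe per load
print(to_mhz(strobe_period_ns(8, t_load_8)))   # serial nonsliding: eight strobes
```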

To achieve real-time (10-MHz) operation on 8-bit data with 50-ns EPROM access time, four
PIPE LSICs can be operated in parallel as shown in Figure 63 for transform calculations and Figure
64 for neighborhood operators. In both implementations, the input data is demultiplexed by the
input strobe pulse of the PIPE LSIC into the four parallel PIPEs and the tri-state outputs are
multiplexed, using the enable pulse of each PIPE LSIC. A block diagram of the PIPE LSIC
Demonstration Brassboard is shown in Figure 65.
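The figure of four parallel devices follows from simple arithmetic. The sketch below assumes each device delivers one answer every B × access-time nanoseconds (so the accumulator is not the limiting factor) and that the outputs of the parallel devices interleave evenly; the function name `chips_required` is hypothetical.

```python
import math

def chips_required(target_rate_mhz, bits, t_access_ns):
    """Number of parallel devices needed to reach the target output rate.

    Assumes one answer per device every bits * t_access_ns nanoseconds.
    """
    per_chip_period_ns = bits * t_access_ns
    target_period_ns = 1000.0 / target_rate_mhz
    return math.ceil(per_chip_period_ns / target_period_ns)

# 8-bit data, 50-ns access time: each device runs at 2.5 MHz.
print(chips_required(target_rate_mhz=10.0, bits=8, t_access_ns=50.0))
```

With a 50-ns access time and 8-bit data each device needs 400 ns per answer, so a 100-ns (10-MHz) output period requires four devices.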